Skip to content

HIVE-29649: Iceberg: Support Parquet DECIMAL_64 vectorization#6527

Merged
deniskuzZ merged 5 commits into
apache:masterfrom
deniskuzZ:HIVE-29649
Jun 23, 2026
Merged

HIVE-29649: Iceberg: Support Parquet DECIMAL_64 vectorization#6527
deniskuzZ merged 5 commits into
apache:masterfrom
deniskuzZ:HIVE-29649

Conversation

@deniskuzZ

@deniskuzZ deniskuzZ commented Jun 5, 2026

Copy link
Copy Markdown
Member

What changes were proposed in this pull request?

Adds support for Parquet DECIMAL_64 vectorization

Why are the changes needed?

Performace

Does this PR introduce any user-facing change?

No

How was this patch tested?

parquet_decimal64.q, vectorized_iceberg_read_multitable.q

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to enable and validate Parquet DECIMAL_64 vectorization (including for Iceberg tables) by advertising DECIMAL_64 feature support from Parquet/Iceberg input formats, adding a Decimal64 read path to the Parquet vectorized reader, and extending test coverage and golden outputs to assert the new vectorization behavior.

Changes:

  • Advertise DECIMAL_64 as a supported vectorization feature for Parquet (and Iceberg when enabled via table properties).
  • Add a Decimal64ColumnVector read path to the Parquet vectorized primitive column reader (including dictionary decode support).
  • Add/adjust LLAP query tests and Java unit tests to exercise DECIMAL_64 vectorization and update expected outputs.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
ql/src/test/results/clientpositive/llap/parquet_decimal64.q.out New golden output validating Parquet DECIMAL_64 vectorization plan + results.
ql/src/test/queries/clientpositive/parquet_decimal64.q New query to assert Parquet vectorized reader engages DECIMAL_64.
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java Adds wiring for physical variations + a Decimal64 read verification helper.
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedDictionaryEncodingColumnReader.java Adds a unit test entry point for Decimal64 reads (dictionary).
ql/src/test/org/apache/hadoop/hive/ql/io/parquet/TestVectorizedColumnReader.java Adds a unit test entry point for Decimal64 reads (non-dictionary).
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java Implements Parquet Decimal64 read + dictionary decode logic.
ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java Advertises DECIMAL_64 support for Parquet vectorization.
iceberg/iceberg-handler/src/test/results/positive/llap/vectorized_iceberg_read_parquet.q.out Updates plan/output expectations to show DECIMAL_64 in use for Iceberg Parquet.
iceberg/iceberg-handler/src/test/results/positive/llap/vectorized_iceberg_read_multitable.q.out Updates expected output (removes now-unneeded ORC-only toggling sections).
iceberg/iceberg-handler/src/test/results/positive/llap/vectorized_iceberg_read_mixed.q.out Updates plan/output expectations for mixed ORC/Parquet Iceberg vectorization.
iceberg/iceberg-handler/src/test/queries/positive/vectorized_iceberg_read_multitable.q Removes ORC-only toggling queries that are no longer required.
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java Removes ORC-only parameter propagation used for prior vectorization gating.
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergInputFormat.java Enables advertising DECIMAL_64 for Iceberg when table property is enabled (regardless of ORC/Parquet).
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/BaseHiveIcebergMetaHook.java Removes ORC-only parameter constant + helper methods used for vectorization gating.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 1 comment.

@deniskuzZ deniskuzZ force-pushed the HIVE-29649 branch 2 times, most recently from e474e0b to 49ec3e4 Compare June 6, 2026 17:31
@deniskuzZ deniskuzZ requested a review from Copilot June 6, 2026 17:31

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 105 out of 105 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

storage-api/src/java/org/apache/hadoop/hive/ql/exec/vector/Decimal64ColumnVector.java:118

  • enforceScaleAndSet writes vector[elementNum] on success but never clears isNull[elementNum]. If the caller left a stale NULL flag (or initially marked the entry null before calling set), the value will be stored but still treated as NULL downstream. Since set(int, ...) is advertised as a fast path that assumes the isNull entry is already set, it should explicitly set isNull[elementNum] = false when the value is in range.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 105 out of 105 changed files in this pull request and generated 1 comment.

@difin difin left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sonarqubecloud

Copy link
Copy Markdown

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 106 out of 106 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (5)

ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java:1

  • This uses Java pattern matching for instanceof (column instanceof Decimal64ColumnVector dec64), which will not compile on repositories building with Java < 16 (Hive commonly targets Java 8/11). Replace with a classic instanceof check followed by an explicit cast and separate variable declaration.
    ql/src/java/org/apache/hadoop/hive/ql/io/parquet/vector/VectorizedPrimitiveColumnReader.java:1
  • This also uses Java pattern matching with a bound variable in instanceof (hiveType instanceof DecimalTypeInfo hiveDti), which is not supported before Java 16. Use (hiveType instanceof DecimalTypeInfo) ? (DecimalTypeInfo) hiveType : getDecimalTypeInfo() to keep compatibility with older Java targets.
    ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java:1
  • Using a fixed filename under java.io.tmpdir can still collide when tests are run concurrently (e.g., parallel surefire/failsafe, multiple forks, or repeated runs on the same agent), causing flaky failures. Prefer generating a unique per-test path (e.g., UUID-based) and/or using the test framework’s temp directory utilities.
    ql/src/test/org/apache/hadoop/hive/ql/io/parquet/VectorizedColumnReaderTestBase.java:1
  • BigInteger.toByteArray() can return a byte array longer than the requested fixed length (e.g., requiring an extra leading sign byte), which will make len - minimal.length negative and throw ArrayIndexOutOfBoundsException. Add a guard to handle minimal.length > len (either trim a leading sign byte when safe, or fail with a clear assertion since this is test code).
    ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java:1
  • This allocates a new array on every call. Consider returning a private static final VectorizedSupport.Support[] constant to avoid repeated allocations on hot paths.

@deniskuzZ deniskuzZ merged commit 31f8a1c into apache:master Jun 23, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants